# Reinforcement Learning Fine-Tuning
## Deductive Reasoning Qwen 32B
- **License:** MIT
- **Description:** A reinforcement fine-tuned model built on Qwen 2.5 32B Instruct, designed to solve challenging deductive reasoning problems from the Temporal Clue dataset (a hedged loading sketch follows this entry).
- **Tags:** Large Language Model, Transformers, English
- **Publisher:** OpenPipe
- **Downloads:** 1,669 · **Likes:** 39
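The listing only summarizes the model, so the sketch below illustrates how a reinforcement fine-tuned Qwen 2.5 32B Instruct derivative like this one might be loaded and prompted with the Hugging Face `transformers` API. The repo id `OpenPipe/Deductive-Reasoning-Qwen-32B`, the use of the standard Qwen chat template, and the example puzzle are assumptions, not details taken from the listing.

```python
# Minimal sketch, assuming the model is published under the repo id below
# and keeps the standard Qwen 2.5 chat template (both are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenPipe/Deductive-Reasoning-Qwen-32B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# A Temporal Clue-style deduction prompt (illustrative only).
messages = [
    {
        "role": "user",
        "content": "Three suspects were each in a different room at a "
                   "different time. Given the clues below, deduce who was "
                   "in the study at 9 pm and explain your reasoning step "
                   "by step.",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Print only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```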
## Tifa-DeepsexV2-7b-MGRPO-safetensors (GGUF)
- **License:** Apache-2.0
- **Description:** A bilingual (Chinese and English) large language model built on the transformers library and optimized through incremental pre-training, supervised fine-tuning, and reinforcement learning; suited to role-playing and chain-of-thought tasks. A sketch for running a GGUF quantization follows this entry.
- **Tags:** Large Language Model, Multilingual
- **Publisher:** mradermacher
- **Downloads:** 283 · **Likes:** 1
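Because this entry advertises GGUF quantizations, a common way to try one locally is `llama-cpp-python`. The sketch below is only an illustration: the repo id, the quantization filename pattern, and the context size are assumptions, so check the publisher's actual file list before running it.

```python
# Minimal sketch for running a GGUF quantization with llama-cpp-python.
# Repo id and filename pattern below are assumptions, not confirmed names.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/Tifa-DeepsexV2-7b-MGRPO-safetensors-GGUF",  # assumed
    filename="*Q4_K_M.gguf",  # assumed quantization; pick any available file
    n_ctx=4096,
)

# Simple chat-style call; the model is marketed for role-play and
# chain-of-thought, so longer system prompts would be typical in practice.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself briefly."}],
)
print(result["choices"][0]["message"]["content"])
```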